The impact of weather conditions on the inflow of refugees in Europe

1. Introduction

Since September 2015, the British Red Cross (BRC) has developed a number of projects to anticipate the number of refugee arrivals and their flows in Europe. This project was commissioned to investigate the effect of weather conditions on refugee arrivals. Our focus is on the final leg of the refugees' journey to Europe (by boat) and the impact of weather conditions on that sea crossing.

Our aim is to explore whether reasonably good predictions can be made from very little data, and whether these can help policy makers, NGOs and other interested parties. For these groups, it is necessary to understand the dynamics linking weather conditions and refugee arrivals. Policy makers either want to hinder refugees (make it more difficult for them to arrive) or help them (make sure that they are not stuck in places where they might die). NGOs want to provide better services. The EU wants to make sure that refugees do not travel in bad weather (they do, and many die).

This document is structured as follows. In section 2, we outline our research questions and project goals, followed by an overview of our approach in section 3. Section 4 discusses the background research, from which we derive our theoretical variables. Section 5 presents an overview of the number of refugee arrivals to Greece and Italy. In section 6 we explain the data collection process, which mainly concerns weather-related data. In section 7 we perform an exploratory analysis of the available data. In section 8, we construct several models aimed at finding the most relevant variables. Finally, in section 9, we summarize our results and suggest follow-up projects.

2. Problem Statement and project goals

The project goals are as follows:

  • Gain a better understanding of UNHCR refugee arrival data. That is, try to better understand the limitations of the dataset.
  • Analyze the relationship between weather status and the number of arrivals to Europe to see if we can make somewhat accurate predictions about the number of arrivals.
  • Identify relevant predictors

The sub-questions which lead to the main goals are:

  • What is the quality of the data?
  • How strong, if any, is the relationship between weather status and the number of arrivals to Europe?
  • If there is a relationship, which weather parameters affect the number of refugee arrivals the most?

3. Approach

To tackle this problem we create two models. The first model contains theoretical variables, that is, variables identified as important through our background research into weather conditions that may encourage or discourage the flow of refugees. The second model performs variable selection on our dataset and so selects the best predictors from the data. It serves as a black-box check: if it turns out that our theoretical variables are not related to the number of refugees, this indicates that we should consider an alternative approach.

4. Background and theoretical variables

In this section, we discuss the results of our research on weather-related parameters and other important factors that theoretically influence the number of refugee arrivals.

4.2. Other important factors in refugee inflow to Greece

In our study we found that weather conditions affect refugee inflows to Greece and Italy differently (see the Exploratory data analysis section). Many presumed that the influx of refugees to Greece would decrease in winter 2015 because of the high risk of the journey across the Aegean from Turkey to Greece. However, the evidence shows that factors other than weather conditions also influence the number of refugee arrivals. Some of these factors are as follows:

  • Poor weather means a discount

According to many anecdotes from refugees, traffickers keep demand high by offering seasonal discounts in winter [8], which leads people to risk their lives at sea. Refugees also report that the smugglers do not travel with them; they sometimes hold back a family member as a hostage to get a refugee to bring the boat back [9].

  • Risk of closing borders

According to Carlotta Sami, spokeswoman for the UN refugee agency in southern Europe, the surge in the number of refugees in bad weather is due to their fear that European borders will close once they reach northern Europe [10].

  • Turkey Deal

Following the EU-Turkey agreement of 18 March 2016, a sharp reduction in illegal migration flows was seen [11]. This agreement, which aimed to end irregular migration from Turkey to the EU, was successful in tackling this issue to a large extent.

4.3. Summary

Figure 1 shows the factors with the greatest impact on the number of refugee arrivals. The thickness of each line shows its approximate weight. Positive and negative signs show whether a factor increases or decreases the number of refugee arrivals. According to the sources cited above, factors unrelated to weather conditions, such as discounts offered by traffickers and the risk of borders closing in winter, were more influential. The encouraging effect of these factors on the flow of refugees outweighed the deterrent effect of bad weather.

Figure 1. Summary of different variables on the number of refugee arrivals

5. An overview of refugee arrivals to Greece and Italy

Analysis of the number of arrivals to Greece from October 2015 to June 2016 confirms the findings of our background research. As expected, the refugee influx did not slow down in winter 2015 because of bad weather. On the other hand, the impact of the EU-Turkey deal is clearly visible in the decreased number of arrivals from March 2016.

Figure 2. Distribution of the number of refugee arrivals to Greece by temperature and month

For arrivals to Italy, Figure 3 shows a recurring pattern in the number of arrivals in 2015 and 2016. This pattern appears to be influenced more strongly by weather conditions.

Figure 3. Distribution of the number of refugee arrivals to Italy by year and month

How to proceed?

We therefore decided to continue our analysis of the impact of weather conditions on the number of arrivals using the Italy dataset. The analysis covers the period from October 2015 to September 2016.

6. Data collection

Two data sources are available for this analysis: a dataset of the number of arrivals published by the United Nations High Commissioner for Refugees (UNHCR) [12] and a weather history API, the Dark Sky API [13]. The Dark Sky API returns the observed hour-by-hour and daily weather conditions for a particular date at a given location.

In this study we focus on the weather conditions that may encourage or discourage refugees from starting their journey, rather than conditions that may influence travel time. Therefore, we need to identify the coastal migrant hubs and the travel time from those hubs to the destination (in this case Italy), so that each arrival date can be matched to the weather on the likely departure date.

According to Zeit Online [14], most refugees come via Libya, Tunisia and Egypt. In Libya, Tripoli [15] and Benghazi [16] are the main ports, from which the crossing to Italy takes at least 3 days [17]. Traveling from Egypt (Kafr-al-Sheikh and Alexandria) to Italy takes more than 10 days, and there are reports of 13 days on board.

Figure 4. Hubs to Italy

As Figure 4 shows, Tripoli is the largest migrant hub to Italy. For the sake of simplicity, we start by analyzing Tripoli data, assuming a 3-day trip.
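The 3-day assumption amounts to pairing each arrival date with the weather observed in Tripoli three days earlier. The following is a minimal Python sketch of that pairing (the report's own analysis was carried out in R). The function name, the dictionary layout and the wind speed value are illustrative assumptions, not the authors' code.

```python
from datetime import date, timedelta


def pair_with_departure_weather(arrivals, weather, trip_days=3):
    """Join each arrival date with the weather observed `trip_days` earlier.

    arrivals: dict mapping arrival date -> number of arrivals
    weather:  dict mapping observation date -> dict of weather variables
    Arrival dates without a weather record for the assumed departure day
    are skipped.
    """
    rows = []
    for arrival_date in sorted(arrivals):
        departure_date = arrival_date - timedelta(days=trip_days)
        if departure_date in weather:
            rows.append((arrival_date, arrivals[arrival_date], weather[departure_date]))
    return rows


# Toy input: the weather value is hypothetical, for illustration only.
arrivals = {date(2015, 10, 5): 0, date(2015, 10, 6): 2001}
weather = {date(2015, 10, 3): {"windSpeed": 4.2}}
pairs = pair_with_departure_weather(arrivals, weather)
```

Changing `trip_days` makes it possible to test other assumed crossing times, for example the longer trips from Egypt.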

6.1. Weather API data points

Considering the weather parameters mentioned above, the following related variables are available in the Dark Sky API. In theory, these variables have the greatest influence on the number of refugee arrivals.

  • Wind Speed: The wind speed in miles per hour.

  • Wind Bearing: The direction that the wind is coming from in degrees, with true north at 0° and progressing clockwise.

  • Moon Phase: A value of 0 corresponds to a new moon, 0.25 to a first quarter moon, 0.5 to a full moon, and 0.75 to a last quarter moon.

This variable is related to tides. Tides are the periodic rise and fall of the sea level caused by the gravitational forces of the Sun and Moon. When the Sun lines up with the Moon and the Earth, as during a new moon or full moon, their tidal effects add up and the tidal range is largest; these are called spring tides. During the first or third quarter moon, by contrast, the Sun and the Moon are 90 degrees apart as seen from Earth, so the Sun's gravity works against the Moon's and the tidal range is at its minimum; these are called neap tides. The interval between spring and neap tides is about seven days.
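The mapping from moon phase to tide regime described above can be sketched in Python as follows. This is an illustration only: the function name and the `tolerance` window (how close to a new, full or quarter moon a day must be) are assumptions and do not appear in the report.

```python
def tide_regime(moon_phase, tolerance=0.0625):
    """Classify a moon-phase value in [0, 1) as 'spring', 'neap' or 'intermediate'.

    0 and 0.5 (new and full moon) are treated as spring tides, 0.25 and 0.75
    (quarter moons) as neap tides. `tolerance` is a hypothetical window width
    expressed as a fraction of the lunar cycle.
    """
    phase = moon_phase % 1.0
    half_cycle_position = phase % 0.5
    # Distance to the nearest new or full moon.
    to_syzygy = min(half_cycle_position, 0.5 - half_cycle_position)
    # Distance to the nearest quarter moon.
    to_quarter = 0.25 - to_syzygy
    if to_syzygy <= tolerance:
        return "spring"
    if to_quarter <= tolerance:
        return "neap"
    return "intermediate"
```

A categorical variable of this kind could be tried as an alternative to the raw moon-phase value, since the tidal effect is the same at new and full moon even though the numeric values (0 and 0.5) differ.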

  • Precipitation Type: The type of precipitation occurring at the given time. If defined, this property will have one of the following values: “rain”, “snow”, or “sleet” (which refers to each of freezing rain, ice pellets, and wintery mix).

  • Apparent Temperature Max/ Min: Maximum and minimum of apparent temperatures in Fahrenheit.

  • Humidity: The relative humidity, between 0 and 1, inclusive.

7. Exploratory data analysis

7.1. Data Quality

7.1.1. UNHCR data

Data quality is evaluated in terms of the number of arrivals registered per day. As the following diagram shows, for a large proportion of this dataset (more than 40% of days), no refugees were registered in Italy. Although some of these days may reflect missing data (data errors), we treat zeros as valid data points for the following reasons.

  1. There is a pattern in the zero and near-zero data points: in the cold months of the year, days with very few arrivals are much more frequent than in other months.

  2. The sea trip from Libya to Italy takes 3-4 days. On some days ships arrive with a large number of refugees, while on other days no ship arrives.

7.1.2. Dark Sky API data

The data extracted from the weather history API includes some columns with more than 50% null values. Observations that include null values are ignored by most machine learning algorithms, so keeping these columns would cost us a large amount of data. On the other hand, imputing the missing data with replacement values such as the column mean is not accurate when such a large percentage of each column is missing. Therefore, we decided to remove these columns. We thus give up some predictors in favour of more observations and a more accurate result.
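The column filter described above can be sketched in Python as follows (the report itself used R). The row layout, the function name and the toy values are assumptions for illustration; the report does not list which columns were actually removed.

```python
def drop_sparse_columns(rows, max_missing=0.5):
    """Remove columns whose share of missing (None) values exceeds max_missing.

    rows: list of dicts that all share the same keys.
    Returns (cleaned_rows, dropped_column_names).
    """
    if not rows:
        return [], []
    n = len(rows)
    dropped = [
        col for col in rows[0]
        if sum(r[col] is None for r in rows) / n > max_missing
    ]
    cleaned = [{k: v for k, v in r.items() if k not in dropped} for r in rows]
    return cleaned, dropped


# Hypothetical toy records: precipType is missing in 2 of 3 rows.
rows = [
    {"windSpeed": 3.1, "precipType": None, "humidity": 0.7},
    {"windSpeed": 4.0, "precipType": None, "humidity": None},
    {"windSpeed": 2.5, "precipType": "rain", "humidity": 0.6},
]
cleaned, dropped = drop_sparse_columns(rows)
```

Rows that still contain a missing value after this step (such as the second toy row) would be dropped at model-fitting time, which is the trade-off the paragraph above describes.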

7.2. A snapshot of all predictors

Here is a snapshot of the available variables after cleaning the data.

        Date Arrivals.to.Italy               icon1 moonPhase1
2 2015-10-02                 0 partly-cloudy-night  0.1800151
3 2015-10-03               128 partly-cloudy-night  0.3204181
5 2015-10-05                 0 partly-cloudy-night  0.5661234
6 2015-10-06              2001 partly-cloudy-night  0.7065264
7 2015-10-07              1010 partly-cloudy-night  0.8118287
8 2015-10-08                 0 partly-cloudy-night  0.9171309

  apparentTemperatureMin1 apparentTemperatureMax1 temperatureMin1 temperatureMax1
2               0.4969908               0.3525569       0.4956943       0.4562215
3               0.4591486               0.5152885       0.4574580       0.5788080
5               0.9510977               0.4068008       0.9545307       0.4562215
6               0.8186498               0.6237763       0.8207034       0.4562215
7               0.9510977               0.6961015       0.9545307       0.4562215
8               0.7618865               0.9854022       0.7633489       1.1181886

  cloudCover1 dewPoint1 humidity1 windSpeed1 windBearing1 visibility1
2   1.8851405 0.9057322 0.2496739  0.9019012   -0.7569944   0.2577438
3   1.0092320 0.9366248 0.4325580  1.2064041   -0.6487149   0.9797342
5   0.4739546 1.1322782 0.6154420  0.7922802   -0.8382040   0.7775769
6   1.5445094 1.5338825 1.4384200 -1.2052586   -0.9555067   1.0952527
7   1.3498631 1.6574531 1.7127460 -0.4866319   -1.4878806  -0.4642466
8   0.0846620 1.3073365 0.7068840 -1.4488609    0.9845000  -0.0888116
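The negative values for quantities such as wind speed and visibility suggest that the predictors in this snapshot were centred and scaled before modeling, although the report does not say so explicitly. Assuming standard z-scoring (subtract the mean, divide by the sample standard deviation), the transformation can be sketched in Python like this; the raw values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Centre values to mean 0 and scale them to unit sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


wind_speed_mph = [2.0, 4.0, 6.0, 8.0]  # hypothetical raw readings
z_scores = standardize(wind_speed_mph)
```

With standardized predictors, the coefficients of a linear model are expressed per standard deviation of each variable, which makes their magnitudes comparable across variables.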

8. Modeling

8.1. Evaluation Method

Our model evaluation is based on k-fold cross-validation (CV) with k = 10 folds. As the evaluation metric for our models we use the root mean square error (RMSE), averaged over the k folds.
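The evaluation scheme can be sketched in plain Python as follows. This is a minimal illustration and not the authors' R code: the `fit` and `predict` callables, the shuffling seed and the toy data are assumptions, and the baseline "model" simply predicts the training mean.

```python
import math
import random


def kfold_rmse(xs, ys, fit, predict, k=10, seed=0):
    """Average RMSE over k folds.

    fit(train_x, train_y) returns a model; predict(model, x) returns a number.
    Requires len(xs) >= k so that no fold is empty.
    """
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    fold_rmses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        fold_rmses.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return sum(fold_rmses) / k


def fit_mean(train_x, train_y):
    """Baseline: remember the mean of the training targets."""
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    """Baseline: predict the stored mean regardless of x."""
    return model


xs = list(range(40))
ys = [float(x % 5) for x in xs]  # hypothetical toy target
score = kfold_rmse(xs, ys, fit_mean, predict_mean, k=10)
```

Because RMSE is expressed in the units of the target, a score here reads directly as a typical error in number of arrivals per day.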

8.2. Linear Models

8.2.1. Linear model using theoretical variables

In the early stage of this project we investigated the weather parameters that may affect the inflow of refugees. Here, we create a simple linear model using the variables considered most important in that theoretical analysis. This model has 5 predictors: wind speed, wind bearing, moon phase, minimum apparent temperature, and humidity.

The results are as follows: the RMSE of the theoretical model is 774.47. The summary of this model (see below) shows that in most cases the standard errors are relatively high, and in some cases even larger than the coefficients themselves. This could be due to strong collinearity among the variables. Other explanations are that the coefficients are not well estimated or that they are close to zero.


Call:
glm(formula = Arrivals.to.Italy ~ ., data = libya.data[, columns.theory])

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-775.0  -426.6  -221.6    71.9  4743.6  

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)               439.19      42.99  10.215  < 2e-16 ***
windSpeed1                -72.40      45.43  -1.594 0.112040    
windBearing1              -17.30      45.53  -0.380 0.704305    
moonPhase1                 60.49      43.18   1.401 0.162211    
apparentTemperatureMin1   175.88      47.01   3.742 0.000218 ***
humidity1                  40.59      43.87   0.925 0.355555    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 589658.9)

    Null deviance: 196500360  on 318  degrees of freedom
Residual deviance: 184563241  on 313  degrees of freedom
AIC: 5151.9

Number of Fisher Scoring iterations: 2
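For a Gaussian model the deviance equals the residual sum of squares, so the two deviance figures in the summary above give the share of variation explained by the model. The short Python calculation below uses only numbers printed in the summary; the variable names are ours.

```python
null_deviance = 196_500_360      # from the summary, on 318 degrees of freedom
residual_deviance = 184_563_241  # from the summary, on 313 degrees of freedom
residual_df = 313

# Share of the total variation explained (equivalent to R-squared here).
explained = 1 - residual_deviance / null_deviance

# Dispersion parameter and the corresponding residual standard deviation.
dispersion = residual_deviance / residual_df
residual_sd = dispersion ** 0.5

print(f"share of deviance explained: {explained:.3f}")
print(f"dispersion: {dispersion:.1f}")
print(f"residual standard deviation: {residual_sd:.1f}")
```

The explained share comes out at roughly 6%, which is consistent with the weak coefficients in the table: only the minimum apparent temperature is clearly distinguishable from zero.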

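The following table shows the correlations among the predictor variables. None of the variables appears to be highly correlated with another (the largest correlation is about 0.32), which makes collinearity an unlikely explanation for the large standard errors.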

                        windSpeed1 windBearing1 moonPhase1 apparentTemperatureMin1 humidity1
windSpeed1               1.0000000    0.0717517  0.0467218               0.2601069 0.1836955
windBearing1             0.0717517    1.0000000  0.0316937               0.3195488 0.0348175
moonPhase1               0.0467218    0.0316937  1.0000000               0.0418494 0.0092763
apparentTemperatureMin1  0.2601069    0.3195488  0.0418494               1.0000000 0.0099789
humidity1                0.1836955    0.0348175  0.0092763               0.0099789 1.0000000
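A correlation table of this kind can be computed with the Pearson coefficient. The Python sketch below shows the calculation on hypothetical toy columns; it is not the code that produced the table above, and the function names are ours.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """columns: dict of name -> list of values. Returns a nested dict of correlations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy columns, for illustration only.
toy = {
    "windSpeed": [1.0, 2.0, 3.0, 4.0],
    "humidity": [0.9, 0.7, 0.8, 0.4],
}
matrix = correlation_matrix(toy)
```

Pairwise correlations only detect linear relationships between two variables at a time; collinearity involving three or more predictors would need a different check, such as variance inflation factors.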

Conclusion

So, either the variables are not relevant (especially windBearing1 and humidity1, whose standard errors are larger than their estimates) or a linear model does not describe this dataset well. In the next steps we build two further models: (1) a linear model using all available variables and (2) a linear model using variables selected by a Lasso model. We then compare all three models by their RMSE.

8.2.2. Linear model with all variables

Here we fit a simple linear model using all available predictors and compare its RMSE with that of the theoretical model.

The RMSE of this linear model is 760.554.

8.2.3. Lasso Model

The Lasso performs both variable selection and regularization in order to improve the prediction accuracy and interpretability of the resulting statistical model. We applied cross-validation on our dataset to find the optimal tuning parameter, \(\lambda\), and fit a Lasso model. After reducing the set of predictors with the Lasso, we fit a linear model on the selected variables and evaluate it by its RMSE.
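To illustrate how the Lasso sets some coefficients exactly to zero, here is a minimal Python sketch of coordinate descent with soft-thresholding. It is a textbook formulation under stated assumptions (centred response and centred predictor columns, objective (1/2n)·||y − Xb||² + λ·||b||₁), not the routine used for the report, and the toy data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; return 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X: list of rows (each a list of p centred predictor values)
    y: list of centred responses
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    column_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # equals y - X @ beta, and beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if column_scale[j] == 0.0:
                continue
            # Covariance of column j with the residual that ignores column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / column_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: y depends only on the first column; the second is unrelated.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy case the unrelated second coefficient is set exactly to zero and the first is shrunk below its least-squares value of 2, which is the selection-plus-shrinkage behaviour described above. A larger `lam` removes more variables; in practice it is chosen by cross-validation.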

As the following diagram shows, increasing \(\lambda\) does not cause a considerable decrease in MSE. Consequently, shrinkage brings no significant benefit. This could be evidence of irrelevant or missing variables, for example because of the selected region (Tripoli, Libya) or the assumed length of the sea trip from Tripoli to Italy (currently 3 days).

The RMSE of a linear model built on the Lasso-selected variables is 751.589. Such a large error (an MSE on the order of 600,000 squared arrivals) may indicate a biased model that needs improvement.

The selected predictors are as follows:

visibility1
apparentTemperatureMin1
windSpeed1
moonPhase1
iconpartly-cloudy-night
cloudCover1
humidity1

8.2.4. Models Comparison

  • Despite its poor results, the Lasso model (with 7 predictors) performs best among the models above. The rest of the analysis is therefore based on the variables in this model.

  • The theoretical model includes 5 predictors, 4 of which it shares with the Lasso model.

8.3. Model improvements

8.3.1. Residual plot analysis

To understand and improve our selected regression model, we analyze its residual plot. Here is the residual plot of a linear model using the Lasso-selected variables.

Two patterns are recognizable in this plot. We suspect that the line represents days with zero arrivals: for those days the residual equals minus the fitted value, so the points fall on a straight line. This can be interpreted as follows. The sea trip from Libya to Italy takes 3-4 days. On some days ships arrive with a large number of refugees, while on other days no ship arrives.

8.3.2. Smoothing

One way to make our dataset more uniform is smoothing. Smoothing methods attempt to capture important patterns in the data while leaving out noise and other fine-scale structures or rapid phenomena [18].

Among the smoothing methods, we found the Simple Moving Average (SMA) to be the best fit for our dataset. The SMA is the unweighted mean of the previous n data points.
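A simple moving average can be sketched in Python as follows. The window length used in the report is not stated, so `window=3` below is only a placeholder; the sketch also assumes that the window includes the current observation and that positions with fewer than `window` observations are left undefined. The input values are the arrival counts from the snapshot in section 7.2, used here as toy data even though those rows are not all consecutive days.

```python
def simple_moving_average(values, window):
    """Unweighted mean over the current value and the window-1 values before it.

    Positions with fewer than `window` observations available are None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        smoothed.append(running_sum / window if i >= window - 1 else None)
    return smoothed


arrivals = [0, 128, 0, 2001, 1010, 0]
smoothed = simple_moving_average(arrivals, window=3)
```

Averaging spreads a single large arrival over neighbouring days, which removes most exact zeros from the target. Note that errors measured on a smoothed target are not directly comparable with errors measured on the raw daily counts.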

As a result of smoothing, the residual plot is more uniform (i.e. the line pattern is less recognizable). However, the RMSE is still not satisfactory: 432.225.

Conclusion

The high RMSE could be due either to a biased model or to the lack of key predictors in our data. In the next step we apply a non-linear approach in order to decrease bias, although a reduction in bias may also lead to an increase in variance.

8.4. Non-linear Model

In a Generalized Additive Model (GAM), which is an extension of the linear model, the response depends linearly on smooth functions of the predictor variables. We selected a GAM for the following reasons:

  • We can fit a non-linear model and potentially benefit from more accurate prediction.

  • Our model is still interpretable. Because the model is additive, we can still examine the effect of each variable on the number of arrivals individually while holding all of the other variables fixed. Hence a GAM retains the advantage of allowing inference about individual predictors.

We applied a GAM to the variables chosen by the Lasso model; the following is the summary of our model.


Call:
lm(formula = sm.Arrivals ~ ns(apparentTemperatureMin1, 3) + ns(cloudCover1, 
    2) + ns(windSpeed1, 2) + ns(moonPhase1, 3) + visibility1 + 
    iconpartlyCloudyNight + humidity1, data = libya.data.with.dummy2[, 
    lasso.variables])

Residuals:
    Min      1Q  Median      3Q     Max 
-677.83 -255.91  -57.61  182.87 2106.20 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       439.06     139.30   3.152 0.001785 ** 
ns(apparentTemperatureMin1, 3)1   517.30      97.61   5.300 2.25e-07 ***
ns(apparentTemperatureMin1, 3)2   324.65     229.59   1.414 0.158378    
ns(apparentTemperatureMin1, 3)3   425.61      99.55   4.275 2.57e-05 ***
ns(cloudCover1, 2)1              -691.67     129.49  -5.342 1.82e-07 ***
ns(cloudCover1, 2)2               -60.99     112.60  -0.542 0.588498    
ns(windSpeed1, 2)1                122.78     155.77   0.788 0.431183    
ns(windSpeed1, 2)2               -346.09      84.96  -4.074 5.92e-05 ***
ns(moonPhase1, 3)1                337.05     100.45   3.355 0.000894 ***
ns(moonPhase1, 3)2               -128.51     193.98  -0.663 0.508154    
ns(moonPhase1, 3)3                153.39      82.72   1.854 0.064666 .  
visibility1                       125.10      24.52   5.102 5.95e-07 ***
iconpartlyCloudyNight             -58.49      56.55  -1.034 0.301835    
humidity1                          43.51      26.60   1.635 0.103013    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 411.9 on 301 degrees of freedom
Multiple R-squared:  0.4082,    Adjusted R-squared:  0.3826 
F-statistic: 15.97 on 13 and 301 DF,  p-value: < 2.2e-16

This model brings a noticeable improvement in RMSE, to 402.657.
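The figures in the model summary can be cross-checked with a short Python calculation. It uses only numbers printed in the summary (R-squared, residual standard error, degrees of freedom); the variable names are ours, and the interpretation at the end is a tentative reading rather than something the report states.

```python
import math

r_squared = 0.4082
residual_se = 411.9   # residual standard error from the summary
residual_df = 301     # residual degrees of freedom
n_terms = 13          # spline and linear terms, excluding the intercept
n_obs = residual_df + n_terms + 1  # 315 observations

# Adjusted R-squared penalises the number of fitted terms.
adjusted_r2 = 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# Residual sum of squares, and the in-sample RMSE it implies.
rss = residual_se ** 2 * residual_df
in_sample_rmse = math.sqrt(rss / n_obs)

print(f"adjusted R-squared: {adjusted_r2:.4f}")
print(f"in-sample RMSE: {in_sample_rmse:.1f}")
```

The adjusted R-squared reproduces the 0.3826 in the summary, and the implied in-sample RMSE lands very close to the reported 402.657. This suggests, without proving it, that the reported figure is an in-sample error on the smoothed target rather than a cross-validated one.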

9. Conclusion

Over the course of this project we investigated the main weather parameters that may influence the inflow of refugees to Europe. Our main goal was to understand which weather variables affect the number of arrivals. Therefore, we focused on linear models because of their simplicity and interpretability.

The process included a comparison of several simple linear models built with different variable-selection methods, among which the Lasso model shows slightly better results than the others. This model was significantly improved by smoothing. We also extended our linear model by applying a Generalized Additive Model, which yields a slightly better RMSE than the previous models.

As a result of this project we gained a better understanding of the UNHCR refugee arrival data and its quality. One of the main features of this data is the large number of zeros, which we treated as valid data points. We addressed this challenge by using smoothing methods.

Our analysis shows a strong relationship between the number of arrivals to Italy and some weather parameters, including minimum apparent temperature, moon phase, wind speed, visibility, and cloud cover (p-value: < 2.2e-16). However, in its current form, this model is not robust enough for prediction (RMSE 402.657).

This model is based on weather conditions in Tripoli, Libya, assuming a 3-day sea trip. To improve this model we need to gather data from different coastal migrant hubs and for different lengths of travel. We can also apply more sophisticated methods such as Random Forest for prediction purposes.

References


  1. Williams, Zoe. Poor Weather Means a Discount: the Refugees Risking Their Lives at Sea. The Guardian, Guardian News and Media, 14 Dec. 2015, https://www.theguardian.com/society/charity-appeal-2015-blog/2015/dec/14/guardian-charity-appeal-poor-weather-discount-the-refugees-risking-their-lives-at-sea.

  2. The Independent. The Independent, Independent Digital News and Media, http://www.independent.co.uk/news/world/middle-east/refugee-crisis-smugglers-offer-seasonal-discounts-to-syrian-refugees-as-oceans-turn-wild-a6698366.html.

  3. Felicity Capon . Newsweek Europe. Newsweek, Newsweek, 25 Sept. 2015, http://europe.newsweek.com/why-conditions-are-about-get-lot-worse-europes-refugees-333641.

  4. Heavy Weather and Rough Water Handling - Boating Victoria - SportsTG. SportsTG, http://websites.sportstg.com/assoc_page.cgi?client=0-10083-0-0-0&sid=303276&&news_task=detail&articleid=25742685.

  5. Heavy Weather and Rough Water Handling - Boating Victoria - SportsTG. SportsTG, http://websites.sportstg.com/assoc_page.cgi?client=0-10083-0-0-0&sid=303276&&news_task=detail&articleid=25742685.

  6. Storm. Wikipedia, Wikimedia Foundation, https://en.wikipedia.org/wiki/storm.

  7. The Average Wind Speed During a Thunderstorm. The Classroom | Synonym, http://classroom.synonym.com/average-wind-speed-during-thunderstorm-24075.html.

  8. Borger, Julian et al. Winter Is Coming: the New Crisis for Refugees in Europe. The Guardian, Guardian News and Media, 2 Nov. 2015, https://www.theguardian.com/world/2015/nov/02/winter-is-coming-the-new-crisis-for-refugees-in-europe.

  9. Williams, Zoe. Poor Weather Means a Discount: the Refugees Risking Their Lives at Sea. The Guardian, Guardian News and Media, 14 Dec. 2015, https://www.theguardian.com/society/charity-appeal-2015-blog/2015/dec/14/guardian-charity-appeal-poor-weather-discount-the-refugees-risking-their-lives-at-sea.

  10. Mezzofiore, Gianluca. Refugee Crisis: Smugglers Offer ‘Bad Weather Discount’ to Migrants Willing to Make Winter Mediterranean Crossing. International Business Times, 30 Oct. 2015, www.ibtimes.co.uk/refugee-crisis-smugglers-offer-discount-refugees-willing-cross-bad-weather-1526441.

  11. BBC News. Migrant Crisis: EU-Turkey Deal Is ‘Working’. BBC News, https://www.bbc.com/news/world-europe-36121083.

  12. United Nations High Commissioner for Refugees (UNHCR). UNHCR Refugees/Migrants Emergency Response - Mediterranean. UNHCR Refugees/Migrants Emergency Response - Mediterranean, http://data.unhcr.org/mediterranean/country.php?id=83.

  13. Documentation Overview. Dark Sky API: https://darksky.net/dev/docs.

  14. Refugee Routes: The New Deadly Paths to Europe. ZEIT ONLINE, http://www.zeit.de/politik/ausland/2016-04/refugees-routes-europe-mediterranean-sea

  15. 26 Migrants Die off Libya Coast, Official Says - Vanguard News. Vanguard News, 23 July 2016, http://www.vanguardngr.com/2016/07/26-migrants-die-off-libya-coast-official-says

  16. What's Behind the Surge in Refugees Crossing the Mediterranean Sea. The New York Times, 21 May 2015, http://www.nytimes.com/interactive/2015/04/20/world/europe/surge-in-refugees-crossing-the-mediterranean-sea-maps.html

  17. Two Refugee Boats Capsize in 24 Hours off Libya Coast. AJE News, http://www.aljazeera.com/news/2016/05/refugee-boat-carrying-capsizes-libya-coast-160526045202245.html

  18. Smoothing. Wikipedia, Wikimedia Foundation, https://en.wikipedia.org/wiki/smoothing.

Parisa Zahedi, Jasper Ginn & Arvid Halma

Leiden University Centre for Innovation

2016-11-18